30 research outputs found

    Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems

    The ability to checkpoint a running application and restart it later can provide many useful benefits, including fault recovery, advanced resource sharing, dynamic load balancing, and improved service availability. However, applications often involve multiple processes that have dependencies through the operating system. We present a transparent mechanism for commodity operating systems that can checkpoint multiple processes in a consistent state so that they can be restarted correctly at a later time. We introduce an efficient algorithm for recording process relationships and correctly saving and restoring shared state in a manner that leverages existing operating system kernel functionality. We have implemented our system as a loadable kernel module and user-space utilities in Linux. We demonstrate its ability on real-world applications to provide transparent checkpoint-restart functionality without modifying, recompiling, or relinking applications, libraries, or the operating system kernel. Our results show checkpoint and restart times 3 to 55 times faster than OpenVZ and 5 to 1100 times faster than Xen.

    Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems

    We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay. We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.

    Teaching Operating Systems Using Virtual Appliances and Distributed Version Control

    Students learn more through hands-on project experience in computer science courses such as operating systems, but providing the infrastructure support for a large class to learn by doing can be hard. To address this issue, we introduce a new approach to managing and grading operating system homework assignments based on virtual appliances, a distributed version control system, and live demonstrations. Our solution is easy to deploy and use with students' personal computers, and obviates the need to provide a computer laboratory for teaching purposes. It supports the most demanding course projects, such as those that involve operating system kernel development, and can be used by both on-campus and remote distance learning students even with intermittent network connectivity. Our experiences deploying and using this solution to teach operating systems at Columbia University show that it is easier to use, more flexible, and more pedagogically effective than other approaches.